Practical 0-1: Functions and loops

Overview

In this practical you will learn a few basics of writing a function to wrap up a snippet of code you will use again, and a for loop that allows you to do the same work multiple times.

Introduction

If you need to use the same lines of code more than once, then you probably need to do one of two things:

  1. If you need to do the same thing multiple times as you solve different problems, then you need to write a function. Functions are a fundamental building block of nearly all programming languages. You use them all of the time, from plotting figures to building linear models, and it’s not that much harder to write entirely new ones from scratch than it is to string a couple together to achieve a more specific goal than any existing function can do on its own.
  2. If you need to do the same thing over and over again in one place, then you need to write a loop. The simplest kind of loop is a for loop, that says that you need to do a specific series of instructions an exact number of times.

It may well be that you need to do both (write functions that you then use inside loops) - in fact you will very soon - but we’ll worry about that in a later practical.

We write both functions and loops to avoid having to copy and paste the same bit of code over and over again into multiple different places. As well as being deeply inelegant, this is very prone to error when you copy something slightly wrong - you miss off a line or a letter, for instance. It is even more annoying that if you find there is something wrong with the code you’re copying, you have to track down everywhere you copied it to and change it in all of these places. Almost nobody ever succeeds in doing this right, and writing functions and loops is therefore one of the easiest ways that you can start to make your code less likely to have mistakes in it, and therefore make the results of your work more likely to be reproducible.

When you are using functions and loops correctly, you should only have to fix the code in one place, and it will automatically be fixed “everywhere”, because the code is actually only written once, it’s just referenced elsewhere.

Functions

It’s very easy to write a function. Here we write a single function (called function_name), that takes one input (we call the inputs to functions arguments, here there is just one called argument). Finally, whatever appears last in the function definition will be returned to script that calls it, in this case, result.

Every function has one “line” of code after it (comments are ignored) that defines what the function does. Because nearly all functions need multiple lines to define what they do, we nearly always matching curly brackets - { ... } - to allow us to write multiple commands that are all treated as one line by R:

function_name <- function(argument) {
  # Function body
  
  # ... exciting code using argument
  #     to do something useful, and
  #     producing some result ...
  result
}

Above, we just have dummy code, and it doesn’t work. Note that the function body, the code that gets run when you use the function is wrapped in curly brackets { ... } - this makes it possible for the computer to know what is supposed to be in the function and what isn’t.

Now we can write a couple of very simple functions though that really do something (not very exciting!):

add_one <- function(number) {
  number + 1
}

add_one(10)

Try running the code block by clicking on RUN CODE. Can you see what it does yet? Feel free to change the argument from 10 to a different value to make sure it works as expected. And change what the function does if you like - when you do, change the name though, so it makes sense… it’s really important for your functions to have good names so it’s easy to tell if you’re using them right. Note that because add_one(10) is after you close the curly bracket, it’s not in the function, it’s in your main script.

The last (and indeed only) line of code in this function is what is returned by the function, here just the number plus one. It’s critical that the last line of your function has the value you want to return in it. Normally we need to do something more complex in our function - here we make a calculation and then return it on the last line (possibly the second most simple thing we could do):

add_one <- function(number) {
  one.more <- number + 1
  one.more
}

add_one(10)

Check that this does the same job. Example 1 may seem simpler. However, it’s usually helpful when writing functions to name the thing that you calculate to make it easier to keep track of what’s going on. You can then return the variable (in Example 2, one.more) that you have just calculated. In Example 2 you can see something very important about functions for the first time. Try running the next example:

add_one <- function(number) {
  one.more <- number + 1
  one.more
}

add_one(10)
one.more

You’ll see that it gives an error. That is because, although you have calculated one.more inside the function, it is discarded when the function ends. This is because variables defined inside functions are local to the function. For there to be a variable in your main script called one.more, you need to define it there. Try adding a line of code between lines 5 and 6 of Exercise 1 that just says:

one.more <- 20

Note that the global one.more variable (the one in your main script) is not affected by calling the add_one() function even though it looks like it is updated there.

Naming of things

Dots and underscores

You may have noticed that my functions were called things like add_one (with an underscore), but my variables had names like one.more (with a dot). This was deliberate. Variables and functions should be made up of one or more words (or abbreviations) to make it easy to understand what they do. However, it can be hard to read names just stuck together like addtwonumberstogether, whereas add_two_numbers_together is easier. However, there are boring reasons to do with how R works why it’s a bad idea to write R functions with dots in their names (even though many functions in R already have dots). As a result, in these practicals, I insist that you don’t uses dots in function names when you write them.

However, dots are generally okay in variables names, so to make it easy to distinguish normal variables from functions, I choose to put dots in variable names (as one.more). You can do whatever you like.

Here’s a function that takes two arguments and adds them together (yes, it is just reinventing +!):

add.up <- function(first, second) {
  added_together <- first + second
  added_together
}

add.up(10, 30)

Rename the function and variables in Exercise 2 so that it satisfies my naming rules.

There is much more on how to style your code to make it more consistent here. Hadley recommends never using dots, so using underscores to separate words in functions and variables. If you want to do that it’s fine. Don’t use dots in function names though. He has lots of other suggestions about how to write clear code - do have a look at it and try to follow it if you have time.

Not reusing names

In Exercise 1, we showed that variables used in functions don’t affect what happens outside the function. Unfortunately, the reverse isn’t true. Look at this function:

missing_add <- function(first) {
  first + second
}

missing_add(10)

This function was intended to be like add_up() in Exercise 2, but I accidentally forgot to add the second argument. Fortunately if you run it, it will give an error (try it). However, what happens if you have defined a variable called second elsewhere in your script? Try adding the following at line 4:

second <- 1

missing_add(10) should no longer give an error, and return 11. Try changing second to 2 - missing_add(10) should now return 12. That’s not good news - what the function does depends on what else you may have done in your script. When you call a function it should always do the same if you give it the same arguments.

This gets even worse when you account for our ability to misspell things:

misspelled_add <- function(first, second) {
  resuIt <- first + second
  result
}

result <- -5
misspelled_add(10, 20)
misspelled_add(1, 2)

Hint: If you can’t see the problem see how the word result is spelled in different places.

You should find that misspelled_add() gives the wrong answer even though the code looks right (if you don’t look too closely). This problem can (and does!) cause hours of agony to some students every year on this course. As a result, we insist on two things in your submitted practicals:

  1. You use different variable names in your functions than in your scripts, wherever possible. Sometimes there really only is one sensible name for a variable (maybe timestep when describing the increments in your simulation model), and you should be very careful when that happens, but normally there are at least two ways of describing the same idea - often one more general in a function (birth.rate) which can be used in multiple contexts, and then specific in your main script each time you use it (human.annual.birth.rate, for instance).
  2. You write checks for your functions to make sure they aren’t accidentally missing any variables or arguments. There are two tools for this, findGlobals() and checkUsage() in the codetools library.

Try adding the following lines to the end of Exercise 3:

library(codetools)
checkUsage(misspelled_add)

You’ll see that it reports on the problem that you have probably already identified. It can also identify other problems, and we recommend it to you. However, it doesn’t return anything you can use in your code to allow us to automatically reject bad functions, and we’ll show you how to do this in a later practical. For now just try using this line instead of checkUsage() above (the library() call is the same):

findGlobals(misspelled_add, merge = FALSE)

You’ll see that it identifies the use of the global variable result too (if less elegantly). We’ll show you how to use this to automatically detect problems with variable misnaming in a later exercise, but for now just remember that you need to watch out for misspelling very carefully.

If you’re interested, the findGlobals() output also tells you why this problem exists in R. You’ll see that the function misspelled_add() needs to use three other global* variables - the functions {, + and <- - to get the function to work. As a result of the way R works, functions have to be able to see what is going on outside to be able to work at all. Most languages handle this more elegantly.*

Loops

Loops allow you to do the same thing multiple times easily. You loop over the same bit of code again and again instead of having to copy and paste the code multiple times into your script. We’ll talk more about different kinds of loops in a later practical, but for now we’ll focus on the simplest – something that tells you to do something a fixed number of times that you know in advance. This is called a for loop.

A for loop always has the same style:

for (one in many) {
  # Do something
  
  # ... exciting code maybe using the variable
  #     one (but it doesn't have to) ...
}

It looks a bit like a function, but what it does is set the variable one to each of the values in many one at a time, and then does something. Here are two very simple examples to get an idea of what it might do:

words <- c("Hello", "Goodbye")
for (word in words) {
  print(word)
}

Try changing the words variable. You’ll see that the loop runs once for each word in the words vector. We can use this to do something a fixed number of times without necessarily caring about the entries:

# Make a sequence of 7 numbers from 1 to 7
times <- seq(from = 1, to = 7) # This is the same as 1:7 or seq(1, 7)

# Print "I'm in a loop" repeatedly
for (iteration in times) {
  print("I'm in a loop")
}

Change the length of the times vector. You’ll see that it prints out the text once for each element in times even though it never looks to see what iteration is (it doesn’t care).

Now look at this code:

# A function to add one to a number
add_one <- function(number)
  number + 1

# Initialise the value
current.value <- 0

current.value <- add_one(current.value)
current.value <- add_one(current.value)
current.value <- add_one(current.value)
current.value <- add_one(current.value)
current.value <- add_one(current.value)

# Output the updated current.value at the end to see what happened
current.value

It just runs our add_one() function from earlier five times to update the variable current.value to 5. Change the code so it does this ten times to produce 10.

Now look at this next piece of code:

# A function to add one to a number
add_one <- function(number) {
  number + 1
}

# Initialise the value
current.value <- 0

# This now is a loop!
for (loop in 1:10) {
  current.value <- add_one(current.value)
}

# Output the updated current.value at the end to see what happened
current.value

It does the same thing - it calls add_one() ten times, but it’s much shorter, which generally makes it easier to see what’s going on. Now change the code to call add_one() 100 times. Hopefully you can see that this is really starting to help now!

And you can probably imagine that if it was any more complicated that just copying and pasting one line it would make it much easier to see that it was doing exactly the same thing each time too.

Finally, add in the single line print(paste("Loop number", loop)) into the loop at the end (between lines 11 and 12) and run it. You’ll now be able see each step of the loop running. We’ll use techniques like this later to check that the loop is doing the right thing.

Note also that unlike functions, variables changed in for loops do change what happens outside the loop, so current.value is 100 at the end, and not still 0.

Conclusion

You’ve now learnt the basics of functions and loops. You’ll be using them throughout the rest of the practicals, so do ask for help if anything isn’t clear.